
Revert "data/aws: Switch to m4.large" #858

Merged Dec 11, 2018 (1 commit)

Conversation

@wking (Member) commented Dec 10, 2018

This reverts commit a230376 (#765), now that we've bumped our t3.medium limits in the CI and dev accounts to cover our expected loads. Moving from m4.large to t3.medium also reduces memory from 8 GiB to 4 GiB, but after a recent run of end-to-end tests, the master consumption was:

$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7980        2263         794           8        4922        5259
Swap:             0           0           0

so 4 GiB should be sufficient. And it also matches our libvirt setup since e59513f (#785).

CC @abhinavdahiya, @crawford, @smarterclayton, @cuppett

The full commit message:

This reverts commit a230376
(data/aws: Switch to m4.large, 2018-11-29, openshift#765), now that we've
bumped our t3.medium limits in the CI and dev accounts to cover our
expected loads.  Moving from m4.large to t3.medium also reduces memory
from 8 GiB [1] to 4 GiB [2], but after a recent run of end-to-end
tests, the master consumption was:

  $ free -m
                total        used        free      shared  buff/cache   available
  Mem:           7980        2263         794           8        4922        5259
  Swap:             0           0           0

so 4 GiB should be sufficient.  And it also matches our libvirt setup
since e59513f (libvirt: Add Terraform variables for memory/CPU, bump
master to 4GiB, 2018-12-04, openshift#785).

[1]: https://aws.amazon.com/ec2/instance-types/#General_Purpose
[2]: https://aws.amazon.com/ec2/instance-types/t3/#Product_Details
@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Dec 10, 2018
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 10, 2018
@crawford (Contributor)

Can we run this several times to make sure that it reliably succeeds? (Requires reliable CI)

@wking (Member, author) commented Dec 10, 2018

> Can we run this several times to make sure that it reliably succeeds? (Requires reliable CI)

Yup, no rush. And I think @smarterclayton was considering e2e reruns to test reliability across the board, although I agree that this change is especially likely to be flake-inducing. Once CI is reliably green for other PRs, I'll /test e2e-aws here a few times.

@cgwalters (Member) commented Dec 10, 2018

Unless there's a specific reason not to, let's keep AWS and libvirt in sync on this in the future. So maybe add a comment to each, like `# If changing the master memory, also change libvirt (and vice versa)`, or even just use a common TF variable?
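A minimal sketch of what the "common TF variable" idea could look like (the variable name and defaults here are hypothetical, not taken from the installer):

```hcl
# Hypothetical shared sizing knob (name is illustrative only).
variable "master_memory_mib" {
  description = "Memory per master, consumed by both the libvirt and AWS modules"
  default     = 4096
}
```

The catch is that libvirt could consume such a value directly, while AWS would still need to map the CPU/memory target onto a concrete instance type.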

@wking (Member, author) commented Dec 11, 2018

> So maybe add a comment to each like `# If changing the master memory, also change libvirt (and vice versa)`...

I'm fine with or without this. Ideally, all platforms are under CI (while we currently have e2e tests for libvirt, they always fail because the cluster lacks a publicly routable API), and that's good enough for me. Keeping the platforms similar is nice, but especially with libvirt, the goals really differ (short-term dev clusters on libvirt vs. the possibility of long-term, eventually production clusters on AWS and other providers). So I don't mind if our default provisioning diverges a bit either.

> ... or even just use a common TF variable?

I don't see a way to do this, because for a given CPU and memory target, there are quite a few AWS instance types that match, and they all have different minor properties (burstable or not, etc.) which are not captured in the libvirt parameters. I'm very committed to having a shared installer core targeting Kubernetes abstractions that is platform-agnostic, but once we get out into the cluster API and installer-provisioned infrastructure, I'm happy to let the individual providers go their own way.

In other news, here's one successful e2e-aws. Let's try for another:

/test e2e-aws

@wking (Member, author) commented Dec 11, 2018

And another successful e2e-aws. Checking again:

/test e2e-aws

@wking (Member, author) commented Dec 11, 2018

And another successful e2e-aws. @crawford, how many of these did you want? ;)

@crawford (Contributor)

I don't think I've ever seen three in a row...

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 11, 2018
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

@crawford (Contributor)

4/5 is still pretty good. I knew I jinxed it with that last comment.

@abhinavdahiya (Contributor)

Is there a way we can watch the cpu / memory utilization during the test run? @wking

@openshift-merge-robot openshift-merge-robot merged commit 6adb78a into openshift:master Dec 11, 2018
@wking wking deleted the back-to-t3.medium branch December 11, 2018 18:56
@wking (Member, author) commented Dec 11, 2018

> Is there a way we can watch the cpu / memory utilization during the test run?

CloudWatch has graphs; I forget whether they charge to show memory or not. I just ran a loop to SSH into the masters in turn and run `free -m`. It seemed stable, but it's obviously not very fine-grained. And CI doesn't run for that long.
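The ad-hoc loop described above can be sketched like this (the master hostnames and `core` SSH user are assumptions for illustration, not from the PR; the `awk` line is what pulls the "used" column out of `free -m` output like the sample quoted in the description):

```shell
#!/bin/sh
# Hypothetical polling loop (hosts and SSH user are placeholders):
#   while true; do
#     for host in master-0 master-1 master-2; do
#       ssh core@"$host" free -m | awk -v h="$host" '/^Mem:/ {print h ": " $3 " MiB used"}'
#     done
#     sleep 30
#   done

# The extraction itself, demonstrated on the output quoted in the description:
sample='              total        used        free      shared  buff/cache   available
Mem:           7980        2263         794           8        4922        5259
Swap:             0           0           0'
printf '%s\n' "$sample" | awk '/^Mem:/ {print $3 " MiB used"}'
# Prints: 2263 MiB used
```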

@cuppett (Member) commented Dec 11, 2018

CloudWatch shows metrics external to the VM, like CPU and IOPS. To see memory or per-filesystem disk usage from inside the VM in CloudWatch, most people run their agent.
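For concreteness, the agent in question is the CloudWatch agent; a minimal agent config that reports memory from inside the VM looks roughly like this (a sketch based on the agent's documented schema; installation steps and other fields omitted):

```json
{
  "metrics": {
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      }
    }
  }
}
```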
